Abstract

Intergenic regions of prokaryotic genomes carry multiple copies of repeat sequences. One class, the nonautonomous
miniature inverted-repeat transposable element (MITE), carries terminal inverted repeat (TIR) sequences. In addition, there are
the repetitive extragenic palindromic (REP) sequences that fold into a small stem loop rich in G-C bonding. The clustered regularly
interspaced short palindromic repeats (CRISPRs) display similar small stem loops but are an integral part of a complex genetic element. Other
classes of repeats such as the REP2 element do not have TIRs but show other signatures. With the current availability of a large
number of whole-genome sequences, many new repeat elements have been discovered. These sequences display diverse 
properties. Some show an intimate linkage to integrons, and at least one encodes a small RNA. Many repeats are found fused 
with chromosomal open reading frames, and some are located within protein coding sequences. Small repeat units appear to 
work hand in hand with the transcriptional and/or post-transcriptional apparatus of the cell. Functionally, they are multifaceted,
with roles that range from the control of gene expression and the facilitation of host/pathogen interactions to stimulation of the
mammalian immune system. The CRISPR complex displays dramatic functions such as an acquired immune system that defends
against invading viruses and plasmids. Evolutionarily, mobile repeat elements may have influenced a cycle of active versus 
inactive genes in ancestral organisms, and some repeats are concentrated in regions of the chromosome where there is 
significant genomic plasticity. Changes in the abundance of genomic repeats during the evolution of an organism may have 
resulted in a benefit to the cell or posed a disadvantage, and some present day species may reflect a purification process. The 
diverse structure, eclectic functions, and evolutionary aspects of repeat elements are described. 
Introduction

Small DNA repeat sequences, less than approximately 400 bp,
are present in the genomes of a wide range of bacteria. These
repeats are primarily in intergenic regions of the chromosome
and are present in multiple copies, some as many as approximately
1,600 (Rocco et al. 2010). Many repeat units fall into
two broad categories, the miniature inverted-repeat transposable
element (MITE) (Siguier et al. 2006; Delihas 2008) and
the repetitive extragenic palindromic (REP) sequence (Stern
et al. 1984; Bachellier et al. 1999). Other repeats such as
the REP2-5 units (Parkhill et al. 2000), YPLA/RU2 (De Gregorio
et al. 2006; Delihas 2007), and bcr elements (Kristoffersen
et al. 2011) appear to constitute separate classes or are subclasses.
The clustered regularly interspaced short palindromic
repeats (CRISPRs) are in a category of their own in that they
are found as an array with short spacer sequences and are
associated with a complex family of protein genes. Most
repeat sequences have the potential to fold into a stable secondary
structure at the DNA and/or RNA level, and many are
transcribed into RNA where the RNA secondary structure may
be a factor in regulating gene expression (Croucher et al. 2011).
Examples of predicted RNA secondary structures of repeat
units are in figure 1. Repeats display diverse roles in terms of
bacterial cell physiology and cell-host interactions. They are
found associated with integron units (Gillings et al. 2009; Poirel et al.
2009). REPs are implicated in stimulation of the mammalian immune
system (Magnusson et al. 2007), and they can affect
genomic plasticity by serving as sites for insertion of transposable
elements (Tobes and Pareja 2006). CRISPR units function as
an RNA-based mechanism of inhibition of invading DNA and
represent a possible example of Lamarckian inheritance in
prokaryotes (Koonin and Wolf 2009).
Fig. 1. — (a) Predicted secondary structures of repeat sequences at the RNA level. The Mfold program was used for RNA folding (Markham and
Zuker 2005). The Neisseria meningitidis nemis (neisseria miniature ISs) is characteristic of MITEs, and the secondary structure shown is similar to that of
Mazzone et al. (2001). The top schematic describes inverted repeats (IR) and DRs flanking the DNA strand. The bcr1 structure is that of Bacillus anthracis
1 R (Økstad et al. 2004) and is typical of the Bacillus bcr1 RNA secondary structures (Klevan et al. 2007). These consist of a cruciform-like structure with
two independent stem loops. The Stenotrophomonas maltophilia REP sequence and secondary structure shown is characteristic of the short high G-C
content REPs found in these species; they are termed SMAG (Rocco et al. 2010). These SMAG units can carry an unpaired tetranucleotide sequence at
one end. (b) Left, predicted RNA secondary structures of the REP2 sequence from N. meningitidis showing internal stem loops 1 and 2. The nt sequence
is from Morelle et al. (2003). Upper schematic denotes the REP2 DNA strand with promoter, ribosome binding site (RBS), and ATG initiation codon. (b)
Right, predicted secondary structural model of the Borrelia burgdorferi IR-A sequence from circular plasmid cp8.3/lp21 [nt sequence from Dunn et al.
(1994)]. Stem loops 1 and 2 may be analogous to those of REP2; however, Dunn et al. (1994) show the two IR-A stem loops in DNA form. Top schematic
depicts the DNA strand with promoter, RBS, and ATG sites on the IR-A segment.



REP sequences were first discovered in Escherichia coli 
(Higgins et al. 1982; Gilson et al. 1984; Stern et al. 
1984), the MITEs in Neisseria (Correia et al. 1986, 
1988), and CRISPR palindromic repeats in E. coli (Ishino 
et al. 1987). Properties of several repeat elements have 
been reviewed in the past (Tobes and Pareja 2006; Brouns 
et al. 2008; Delihas 2008); however, with the advent of an
array of whole-genome sequences and the development of bioinformatics
programs to identify these units (Chen et al.
2009), increasing numbers of repeat elements are being discovered
(table 1), and comparative genomics between
closely related bacterial species is now feasible. Such analyses
have revealed evolutionary changes in genomes that may be
related to repeat sequences, for example, the correlation
between repeat element location and chromosomal plasticity
(Mine et al. 2009; Silby et al. 2009; Ogier et al. 2010;
Kristoffersen et al. 2011). In other studies, a comparison of
changes in repeats during evolution
has led to the concept that a high abundance of mobile
repeats in genomes can be parasitic and a potential disadvantage
to an organism; some current species are found to carry
fewer mobile repeats than their ancestors (Croucher et al.
2011). On the other hand, phylogenetic comparisons of
Pelobacter carbinolicus and its ancestors, together with the
results from genetic experiments using transgenic strains of
Geobacter, led Aklujkar and Lovley (2010) to propose that 
a CRISPR spacer sequence that contained a segment of the 
host gene hisS resulted in an evolutionary loss of ancestral 
genes that rely on the function of hisS. 

Many repeat sequences also display open reading frames 
that are found fused to chromosomal reading frames. These 
fusions are discussed in terms of a possible formation of new 
proteins or the alteration of existing proteins. Jacob (1977) 
proposed the concept of "tinkering" during evolution in 
terms of the combination of two motifs to produce a different 
and more elaborate structure. We review here the diverse 
molecular, functional, and evolutionary aspects of recently 
discovered repeat elements. 

MITE — A Repeat Element Found in 
a Broad Range of Bacteria 

MITEs are termed nonautonomous as they are incapable of 
self-transfer and require a transposase acting in trans for 
transposition. Although MITEs were first discovered in bac- 
teria (Correia et al. 1988), they were formalized as 
Table 1

Bacterial Intergenic Repeat Elements

Repeat Element | Organisms | Class | Size (bp) | Reference
--- | --- | --- | --- | ---
Correia | Neisseria sp. | MITE | ~104 to 157 | Correia et al. (1988)
RUP | Streptococcus pneumoniae | MITE | 107 | Oggioni and Claverys (1999)
ERIC | Enterobacteriaceae | MITE | ~127 | Sharples and Lloyd (1990) and Hulton et al. (1991)
MaeMITE | Microcystis aeruginosa | MITE | 150-435 | Kaneko et al. (2007)
Nezha | Anabaena variabilis, Nostoc sp. | MITE | ~130 to 170 | Zhou et al. (2008)
MITE | Anabaena sp. | MITE | ~224 | Wolk et al. (2010)
Chunjie | Geobacter uraniireducens Rf4 | MITE | 178 to 235 | Chen et al. (2008)
Muzha | A. variabilis | MITE | ~154 | Chen et al. (2009)
Duanwu | Haloquadratum walsbyi | MITE | 257 | Chen et al. (2009)
Qixi | H. walsbyi | MITE | 165 | Chen et al. (2009)
Chongyang | H. walsbyi | MITE | 119 | Chen et al. (2009)
MITE | Anabaena sp. | MITE | 127-204 | Fewer et al. (2011)
BOX | Str. pneumoniae | MITE-like | 67-637 | Martin et al. (1992)
R0^a | Pseudomonas fluorescens | MITE-like | 89 | Silby et al. (2009)
R1 | Pse. fluorescens | MITE-like | 80 | Silby et al. (2009)
R2 | Pse. fluorescens | MITE-like | 110 | Silby et al. (2009)
R6 | Pse. fluorescens | MITE-like | 177 | Silby et al. (2009)
IMU | Enterobacter cloacae CHE-2 | IMU (MITE) | 288 | Poirel et al. (2009)
NFM2 MITE | Acinetobacter sp. | NFM2 (MITE) | 439 | Gillings et al. (2009)
SPRITE | Str. pneumoniae | Rho-independent terminator-like | ~105 | Croucher et al. (2011)
CIR | Caulobacter + other sp. | CIR | ~110 | Chen and Shapiro (2003)
RPE | Rickettsia sp. | RPE | ~105 to 146 | Ogata et al. (2000)
YAPL/RU-2 | Yersinia sp. | YAPL/RU-2 | ~168 | De Gregorio et al. (2006) and Delihas (2007)
RU-3 | Escherichia coli, Shigella sp. | RU-3 | 103 | Delihas (2007)
bcr1^b | Bacillus cereus group | bcr Group A | ~155 | Økstad et al. (2004)
bcr5^c | B. cereus group | bcr Group B | 310 | Kristoffersen et al. (2011)
REP | Enterobacteriaceae | REP | ~35 | Stern et al. (1984) and Gilson et al. (1984)
REP | Pse. putida^d | REP | 35 | Aranda-Olmedo et al. (2002)
IR1_g | Pse. fluorescens | REP | ~25 | Silby et al. (2009)
REP | Stenotrophomonas sp. | REP | ~35 | Nunvar et al. (2010) and Rocco et al. (2010)
ATR | Pse. fluorescens | ATR | 183 | Silby et al. (2009)
R178 | Pse. fluorescens | R178 | 101 | Silby et al. (2009)
REP2 | Neisseria meningitidis | REP2 | ~134 to 154 | Parkhill et al. (2000)
REP3 | N. meningitidis | REP3 | 60 | Parkhill et al. (2000)
REP4 | N. meningitidis | REP4 | 26 | Parkhill et al. (2000)
REP5 | N. meningitidis | REP5 | 20 | Parkhill et al. (2000)
RS (NIME) | N. meningitidis | RS (NIME) | 70-200 | Parkhill et al. (2000)
CRISPR | E. coli + other sp. | CRISPR | 28-49 | Ishino et al. (1987)
Borrelia IR | Borrelia burgdorferi | IR-A, IR-B | ~180 | Dunn et al. (1994)
BRE | Beta-proteobacteria | BRE | ~90 | Hot et al. (2011)
Stem loop left | Borrelia sp. | Stem loop left | 34 | Delihas (2009)
Stem loop right | Borrelia sp. | Stem loop right | 32-51 | Delihas (2009)

a Seven additional repeat elements without IR not shown. 
b Two additional similar repeats not shown. 
c Two additional similar repeats not shown. 

d See Tobes and Pareja (2006) for additional species with REP sequences. 



nonautonomous transposable sequences in plants (Bureau 
and Wessler 1992, 1994; Feschotte et al. 2002; Kikuchi 
et al. 2003). Experimentally, they have been transferred 
by transposases in vivo in both prokaryotes and eukaryotes 
(Poirel et al. 2009; Yang et al. 2009; Hancock et al. 2010). 



Bacterial MITEs either are or once were mobile. They are
generally less than 200 bp, but some are as large as approximately
400 bp. MITE sequences have signatures typical of
many insertion sequences (ISs), that is, they contain terminal
inverted repeats (TIRs) that straddle a core sequence and
Fig. 2. — Diagrammatic representation of empty and filled sites in homologous chromosomal regions in Anabaena variabilis and Nostoc sp. (based
on Zhou et al. 2008). The Nezha MITE insertion is shown in A. variabilis. The DRs and TIRs are also shown diagrammatically. Genes depicted as "a" and
"b" are orthologs between the two species. In another chromosomal region (not shown), Nezha can be found inserted into a site in Nostoc sp., while
the same site is empty in A. variabilis (Zhou et al. 2008).



they are flanked by target site duplications (TDs), which consist
of direct repeats (DRs). The core sequence of a MITE,
however, lacks a transposase gene; MITEs that do
carry open reading frames show amino acid sequences unrelated
to transposase sequences (Delihas 2007). Another
classic feature of MITEs is that they can fold into long stem
loop structures at the RNA level (fig. 1a), and some are
highly stable thermodynamically (Chen et al. 2008).
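
The structural signatures described above (TIRs at the two ends of the element and DRs flanking it) can be expressed as a simple sequence check. The following Python sketch is illustrative only: the function names, the toy sequence, and the mismatch tolerance are assumptions made for this example and are not taken from any of the cited studies or search programs.

```python
# Minimal sketch of the MITE signatures described in the text.
# All names and the toy sequence are hypothetical illustrations.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_tir(element: str, tir_len: int, max_mismatches: int = 0) -> bool:
    """True if the first tir_len bases match the reverse complement
    of the last tir_len bases, allowing up to max_mismatches."""
    left = element[:tir_len].upper()
    right_rc = reverse_complement(element[-tir_len:])
    mismatches = sum(a != b for a, b in zip(left, right_rc))
    return mismatches <= max_mismatches


def has_flanking_dr(upstream: str, downstream: str, dr_len: int) -> bool:
    """True if the dr_len bases immediately upstream of the element
    are repeated immediately downstream (a target site duplication)."""
    return upstream[-dr_len:].upper() == downstream[:dr_len].upper()


# Toy element: an 8 bp TIR, a short core, and the inverted copy of the TIR.
toy_element = "GGCATTAC" + "AAAAA" + reverse_complement("GGCATTAC")
print(has_tir(toy_element, 8))
print(has_flanking_dr("CCTTAGC", "TTAGCAA", 5))
```

Searches on real genomes would normally tolerate imperfect TIRs and test a range of TIR and DR lengths; the exact-match defaults here are kept only for brevity.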

MITEs are multifaceted, for example, they can carry 
structure/function motifs, such as an integration host fac- 
tor (IHF) binding site (Buisine et al. 2002), a methyltransfer- 
ase binding site (Chen and Shapiro 2003), or promoter 
sequences (Black et al. 1995; Buisine et al. 2002; Snyder 
et al. 2003). Functionally, promoter strengths have been
measured and RNA transcripts detected in transcriptional
assays, but functional IHF binding has not yet been observed
(Siddique et al. 2011). Many repeats are found at the 3'
end regions of genes and have been shown to be co-transcribed.
Some regulate messenger RNA (mRNA) stability (De Gregorio
et al. 2002, 2006). For example, the presence of an
enterobacterial repetitive intergenic consensus (ERIC) se- 
quence downstream of a gene may induce a conforma- 
tional change in RNA transcripts and create a cleavage 
site for RNase E. This then can activate degradation of up- 
stream mRNAs by 3' to 5' exoribonucleases (De Gregorio 
et al. 2005). An increasing number of MITE and
MITE-like units are being discovered, and
search programs such as MUST (Chen et al. 2009) can accelerate
their discovery, as with the MITEs newly found
in cyanobacteria using the MUST program (Lin et al.
2011). With the availability of genome sequences from
closely related organisms, the recent transposition of 
MITEs in some organisms has been proposed based on bi- 
oinformatics analyses (Zhou et al. 2008; Snyder et al. 
2009). 

As MITEs appear to be prevalent in cyanobacteria, we out- 
line some recent findings. Kaneko et al. (2007) identified 



eight groups of putative MITE sequences in the cyanobacterium
Microcystis aeruginosa. In a follow-up study, Lin et al.
(2011) analyzed 17 cyanobacterial genomes and found several
thousand MITE sequences. Microcystis aeruginosa also
has a high abundance of IS elements, and a linear correlation 
was found between IS and MITE abundance. One group of 
MITEs is believed to be formed by a deletion within an IS el- 
ement. 

In other cyanobacteria, Anabaena variabilis and Nostoc 
sp., MITE sequences termed Nezha (approximately 
130-170 bp) were characterized (Zhou et al. 2008). Nezha 
has signatures characteristic of MITEs, that is, TIRs that are 
similar in sequence to the TIRs of an intact transposon, DRs 
flanking the element, and predicted secondary structures 
that are highly stable thermodynamically. Nezha is predicted 
to be recently mobile based on analysis of empty and filled 
target sites in homologous chromosomal regions from closely 
related species (fig. 2). High percent identities and low E values
show that adjacent genes in the empty and filled chromosomal
sites are orthologous. Nezha shares the same TIR
and nearly the same DR sequences as the IS ISNpu3. However,
ISNpu3 is only found in another species, Nostoc punctiforme.
It is hypothesized that a similar IS transposase moved Nezha in
Nostoc sp. and A. variabilis. In a different study with Anabaena
sp., five closely related MITE sequences have been detected
(Wolk et al. 2010). As described for DNA repeat sequences in
some Enterobacteriaceae species (Delihas 2007), several open 
reading frame fusions are found between open reading 
frames of Anabaena MITEs and chromosomal open reading 
frames (Wolk et al. 2010). 
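
The empty-site versus filled-site reasoning used for Nezha can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the two regions are treated as identical apart from the insertion, matching is exact rather than alignment-based, and the function names and toy sequences are hypothetical. Published analyses rely on sequence alignment of orthologous regions, not on this shortcut.

```python
# Sketch: infer an inserted element and its target site duplication by
# comparing an "empty" site with the homologous "filled" site.
# Names and sequences are hypothetical; matching is exact for simplicity.

def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def compare_sites(empty: str, filled: str):
    """Return (inserted_element, target_duplication), or None when the
    filled site does not look like the empty site plus an insertion."""
    p = common_prefix_len(empty, filled)              # left flank + target
    s = common_prefix_len(empty[::-1], filled[::-1])  # target + right flank
    dup_len = p + s - len(empty)                      # overlap = duplicated target
    if dup_len < 0 or len(filled) - s < p:
        return None
    inserted = filled[p:len(filled) - s]
    target = empty[len(empty) - s:p]
    return inserted, target


left, target, right = "ATGCCGTA", "TTAGC", "GGATCCAT"
element = "CAGGTAAAAACCTG"  # toy element with 4 bp inverted ends
empty_site = left + target + right
filled_site = left + target + element + target + right
print(compare_sites(empty_site, filled_site))
```

Because the target site is duplicated on insertion, the shared left and right flanks overlap in the empty site by exactly the length of the DR, which is how the sketch recovers both the element and the duplicated target.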

MITEs have also recently been characterized in addi- 
tional bacteria. A repeat sequence called "Chunjie" also 
displays the classic signatures of a MITE. It was detected 
in Geobacter uraniireducens Rf4, a member of the delta- 
proteobacteria (Chen et al. 2008). The Chunjie sequences 
are 178-235 bp, contain 21 bp TIRs at each end, are A+T 
rich, and the terminal ends are flanked by 9 bp DRs. These 
Fig. 3. — Diagrammatic representation of the defective integron flanked by identical IMU elements as found on Enterobacter cloacae plasmid pCHE-A
(based on Poirel et al. 2009). The arrows represent the IMU inverted repeats (IR). Also shown are the defective intI gene at the 5' side (left), blaGES-5,
the beta-lactamase gene cassette in the middle, and the defective quaternary ammonium salt gene qacE on the 3' side. Lengths are not drawn to scale.



sequences can fold into highly stable secondary structures
at the RNA level, for example, they show a delta G of approximately
-98 to -130 kcal/mol. Several
Chunjie repeat sequences were found to overlap protein
genes.

MITE-like sequences termed R0, R1, R2, R6, and R178 
(table 1) were detected in Pseudomonas fluorescens 
(Silby et al. 2009). These range from 80 to 177 bp. Most have
TIRs and can fold into stem loop structures. The inverted repeats
of two MITE-like sequences, R0 and R2, are identical to
the inverted repeats found at the ends of IS elements present
in the same organism; thus, it is possible that the MITE-like
sequences can be mobilized by these IS elements. Pseudomonas
fluorescens also has regions devoid of repeats, which
represent 40% of the genome. These regions, called "repeat
deserts," mostly contain essential genes. There may
have been an evolutionary selection process whereby cells 
that developed repeat sequence insertions in housekeeping 
genes could not survive. 
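
The notion of a "repeat desert" can be made concrete with a short calculation over repeat coordinates. The Python sketch below is a hypothetical illustration: the minimum desert length, the linear (noncircular) treatment of the chromosome, the 0-based half-open coordinates, and the toy intervals are all assumptions of this example and do not reproduce the criteria used by Silby et al. (2009).

```python
# Sketch: find repeat-free stretches ("repeat deserts") from repeat
# coordinates. Intervals are 0-based, half-open (start, end); the
# chromosome is treated as linear. Threshold and data are hypothetical.

def repeat_free_regions(genome_len, repeats, min_len):
    """Return repeat-free intervals of at least min_len bases."""
    deserts = []
    pos = 0  # end of the last repeat seen so far
    for start, end in sorted(repeats):
        if start - pos >= min_len:
            deserts.append((pos, start))
        pos = max(pos, end)  # handles overlapping repeats
    if genome_len - pos >= min_len:
        deserts.append((pos, genome_len))
    return deserts


def desert_fraction(genome_len, deserts):
    """Fraction of the genome covered by the given desert intervals."""
    return sum(end - start for start, end in deserts) / genome_len


toy_repeats = [(100, 150), (160, 200), (700, 720)]
toy_deserts = repeat_free_regions(1000, toy_repeats, min_len=300)
print(toy_deserts, desert_fraction(1000, toy_deserts))
```

With real annotation, the resulting intervals could then be intersected with gene coordinates to ask whether essential genes are enriched in the deserts.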

Sequences comparable to the 127 bp ERIC MITE found in
E. coli and related organisms (Hulton et al. 1991; Wilson and
Sharp 2006) are particularly abundant in the chromosome
of Photorhabdus luminescens (Duchaud et al. 2003). These
MITEs are also found in Xenorhabdus (Ogier et al. 2010).
Both Photorhabdus and Xenorhabdus belong to the Enterobacteriaceae
family but are insect pathogens. These ERIC-type
sequences have TIRs, TDs, and a 5' TA 3' motif flanking
both termini.

Snyder et al. (2009) provide evidence for the mobility 
of a Correia repeat (termed CREE) in Neisseria gonor- 
rhoeae based on comparisons of chromosomal differences 
in locations of CREEs in two closely related strains of 
N. gonorrhoeae. The repeats are found in prophage re- 
gions in one strain and not in another, which indicates a re- 
cent transfer. In addition, many CREEs are found on the 
5' side of genes. Thus, the CREEs may influence gene 
expression at the transcriptional level. The same conclusion
was reached by Siddique et al. (2011), who measured
promoter strengths of the Correia repeat element in 
N. meningitidis. 



Integron Mobilization 

A special MITE termed integron mobilization unit (IMU) was 
detected in plasmid DNA of Enterobacter cloacae (Poirel 
et al. 2009). It encompasses a novel structure whereby
two identical IMU sequences flank an intervening sequence
that carries a defective class 1 integrase gene, a defective qacE
gene, and a beta-lactamase gene (blaGES-5) that confers resistance
to the antibiotic carbapenem (fig. 3). The integrase
and qacE genes are features of class 1 integrons. The IMU
sequence is 288 bp and contains TIRs. The spacer sequence
is devoid of transposase sequences and displays no known
motifs, but the IMU can fold into a predicted thermodynamically
stable secondary structure at the RNA level. Importantly,
transposition experiments show that the IMU-integron
complex can be transposed in vivo to another plasmid
by a transposase acting in trans (Poirel et al. 2009). A 5
bp target site duplication (TD) is present at the termini of the
transposed IMU integron. The IMU TIR sequence is almost
identical to the inverted repeat sequence of ISSod9 from
Shewanella oneidensis MR-1 (Poirel et al. 2009). The high
similarity may be associated with recognition of the IMU
by the transposase. The IMU represents the first MITE-like
sequence found in plasmid DNA and, significantly, the first
shown to be a nonautonomous transposable element using
in vivo assays in prokaryotes (Poirel et al. 2009). The Ent.
cloacae plasmid pCHE-A, which contains the IMU-integrase-antibiotic
resistance gene complex, is non-self-conjugative;
thus, the interesting question arises as to
whether the IMU-containing integron can spread antibiotic
resistance to other species via transposition.

A sequence similar to the Ent. cloacae IMU is found in 
a plasmid of another bacterial species but independent of 
an associated integron sequence. This IMU homolog is in 
Aeromonas salmonicida subsp. salmonicida A449 plasmid 
5 (as determined by a Basic Local Alignment Search Tool [BLAST]
search, Expect = 4e-30, Identity = 77%, found at nt
positions 151346-151633, Accession number CP000646;
N.D., unpublished). This putative IMU is in the intergenic region
between locus ASA_P5G161, which encodes a truncated
cobyrinic acid a,c-diamide synthase, and locus
ASA_P5G162, representing a hypothetical protein. Thus,
the IMU may have a broader presence in genomes. 

A defective Tn402-like integron is present in Acinetobacter
sp. str. NFM2 (Gillings et al. 2009). This integron is
flanked by identical copies of a 439 bp DR sequence, which
appears to be MITE-like and is termed NFM2-MITE. The integron
contains deletions at the 5' and 3' ends, which may
have occurred when MITE sequences were fused with the
integron. The outer ends of the MITE are flanked by a 5
bp DR. This MITE-like sequence is A+T rich, has TIRs, and
has the potential to form a highly stable secondary structure.
It may represent a special class of MITEs (Gillings et al. 2009);
however, it has not been found to be transferable by a transposase.
On the other hand, experiments using polymerase
chain reaction primers suggest that excision via homologous
recombination is possible. Additional analyses are needed to
further define this interesting MITE-like integron-associated
element. Although the defective Tn402-MITE carries no antibiotic
resistance genes, the Tn402-like integron is known
to contribute to the proliferation of multiple antibiotic resistance
genes (Gillings et al. 2009).

Repeat Sequences and Noncoding
RNAs

Chinni et al. (2010) detected a small RNA transcript by northern
blots from an intergenic sequence of Salmonella typhi that contains
a heretofore uncharacterized repeat sequence of approximately
200 bp, a repeat that may be MITE-like. This repeat
sequence and its overlapping RNA sequence map in a chromosomal
region of S. typhi that represents a pathogenicity island.
The RNA is growth regulated and appears during mid- to late-log
phase. Sequences similar to the intergenic region in S. typhi
are found in E. coli, but the RNA transcript has not been detected
there. Further detection of possible RNA transcripts in other
S. typhi strains and nucleotide sequence comparisons of the repeat
intergenic region in S. typhi strains and in E. coli may shed
light on possible origins of the RNA and nature of the repeat
sequence. For example, a comparison of sequences may show
changes in the repeat sequence that formed a promoter for the
putative RNA gene locus in S. typhi. A search for sequence
changes that reveal upstream regulatory elements would also
be useful as expression of the RNA is growth regulated.

Small RNA transcripts originating from intergenic chro- 
mosomal regions were detected in N. meningitidis. These 
transcripts are generated by an adjacent Correia element 
promoter (Siddique et al. 2011). In this case, the nt sequence
downstream of the MITE Correia promoter is transcribed 
and not the Correia sequence. 

Diverse Repeats in Streptococcus 
pneumoniae 

Three repeat units, a tandem array of repeat sequences termed
BOX, Repeat Unit of Pneumococcus (RUP), and Streptococcus
pneumoniae Rho-Independent Terminator-like Element
(SPRITE), were identified in Streptococcus sp. (Martin et al.
1992; Oggioni and Claverys 1999; Croucher et al. 2011). Predicted
secondary structures of these repeats suggest possible
roles at the RNA level (Croucher et al. 2011). For example, the
SPRITE structure shows a motif similar to a Rho-independent
termination motif, and its location in the genome is biased
toward regions close to the 3' ends of convergent genes. One identified
BOX element has two T box riboswitch motifs, whereas
another BOX element has open reading frames. Riboswitches
can control gene expression through mRNA binding of a small
target molecule and a subsequent change in RNA conformation
(Tucker and Breaker 2005). BOX is a nonautonomous transposable
unit thought to be mobilized by ISSteo 7 (Knutsen et al.
2006). In addition, the BOX elements have been shown to
be transcribed in Str. pneumoniae (Croucher et al. 2011).
RUP has classic MITE properties with TDs and TIRs that were
described before (Oggioni and Claverys 1999).

Comparison of the abundance of the three repeats be- 
tween Str. pneumoniae clinical isolates and closely related 
species indicates that there was a past burst of repeat ele- 
ment movement in the genome of ancestors, but now, they 
appear dormant and their abundance is diminishing 
(Croucher et al. 2011). When inserted into intergenic re- 
gions, these repeats can function in gene regulation and 
can potentially be of benefit to the cell, but they are also 
found inserted into coding regions of a number of protein 
genes. Disruption of these genes can compromise the cell. 
From the evolutionary analysis of repeats, the authors con- 
clude that streptococcal repeats are largely parasitic and 
may compromise the cell's ability to compete in its environ- 
ment; thus, surviving species have fewer mobile elements. 

Specialized Repeats in Bacillus sp. 

Repeat sequences have been identified in the gram-positive Bacillus
sp. (Tourasse et al. 2006), and 18 have been characterized
(Kristoffersen et al. 2011). These fall into three groups, whereby
Group A sequences have properties of nonautonomous
transposable elements (Kristoffersen et al. 2011). This includes
the repeat unit termed bcr1 (approximately 155 bp), which was
extensively analyzed previously (Klevan et al. 2007). Comparisons
between related Bacillus strains show a nonconserved genomic
distribution with the repeat sequence flanked by 5 bp
DRs in each case. This repeat element is transcribed. Base-pair
compensatory changes are found to maintain a cruciform-like
double-stranded structure at the RNA level. Figure 1a shows
the predicted bcr1 secondary structure. However, a comparison
of bcr1 secondary structures between closely related Bacillus
strains indicates that secondary structures vary in stability,
and the authors suggest that bcr1 repeats lost structural stability
several times during evolution. The bcr1 sequence may represent
a special class of mobile sequences.

bcr5 is part of Group B repeats and is found associated with
a gene cluster that contains a resolvase gene, as well as
a transposase gene and a hypothetical protein gene (Kristoffersen
et al. 2011). bcr5 elements flank both ends of the
resolvase-containing gene cluster. Although the bcr5-associated
gene cluster does not encode an integrase, the cluster
arrangement shows broad similarities to the integron clusters
described above. bcr5 does not have inverted repeats but has
a predicted stable secondary structure. It has not been classified,
but it does not appear to be MITE-like.

Group C elements are conserved phylogenetically in ge- 
nomic locations. Some sequences may represent RNA tran- 
scripts and riboswitches as well. This work further extends 
the repeat element repertoire in the gram-positive bacteria. 

Mycobacterial Interspersed Repetitive 
Units: Possible MITE-Like Sequences 

A class of repeat sequences termed mycobacterial inter- 
spersed repetitive units (MIRUs) was found in several species 
of Mycobacterium (Supply et al. 1997). 

The size ranges from approximately 40 to 100 bp, and 
these elements are found repeated approximately 40-50 
times in the Mycobacterium genome. One of the sites con- 
taining the repeat sequence is found within an intergenic 
chromosomal region of Mycobacterium tuberculosis, be- 
tween two conserved open reading frames that represent 
a conserved hypothetical protein and a serine/threonine 
phosphatase. The MIRU is transcribed as a polycistronic 
mRNA. Homologous regions in Myc. leprae do not contain 
the MIRU repeat sequence. MIRUs display some similarities
to MITEs. Comparison of empty and filled sites and the
presence of tetranucleotide DRs on the 5' and 3' sides
of the filled site in Myc. tuberculosis suggest insertion by transposition.
Although not stated as such, the MIRUs display
imperfect TIRs and have internal inverted repeats that are rich
in G-C bonds. MIRUs display open reading frames, and the
terminal ends of the MIRU sequence overlap the adjacent 
genes in the polycistron mentioned above (Supply et al. 
1997). Important to clinical diagnostics and epidemiological 
analyses, the mycobacterial interspersed repetitive sequences 
are currently used for Myc. tuberculosis genotyping for fast 
identification of clinical isolates (Supply et al. 2006). 

REP Sequences — Multifunctional 
Elements 

REP sequences are approximately 35 bp but range between
21 and 65 bp (Tobes and Pareja 2006). These are some of the
smallest repeat sequences known. They were first found in
Enterobacteriaceae species and later detected in Pseudomonas
and Stenotrophomonas (Aranda-Olmedo et al. 2002;
Silby et al. 2009; Nunvar et al. 2010). REP sequences are often
found in high abundance with several hundred copies
present in genomes either as single units or in clusters called 
bacterial interspersed mosaic elements (BIME) (Bachellier 



et al. 1994). They tend to be G+C rich and can fold into 
perfect or imperfect stem loops. In Pse. syringae, a bias 
for the positioning of the REP elements between convergent 
genes was found (Tobes and Pareja 2005). In E. coli, BIME 
clusters containing REP units have been associated with re- 
combination. They can also affect mRNA stability (Stern 
et al. 1988). BIMEs form binding sites for IHF (Oppenheim
et al. 1993), DNA polymerase I (Gilson et al. 1984), and DNA
gyrase (Yang and Ames 1988), and it has been shown that
DNA gyrase can cleave DNA in vivo in BIME regions (Espeli
and Boccard 1997). Thus, REP units are intimately involved in
molecular processes in the cell.

Although REP sequences do not display MITE signatures,
Nunvar et al. (2010) hypothesize that REPs found in Stenotrophomonas
sp. may be mobilized by a transposase. The transposase
gene termed REP-associated tyrosine transposase
(RAYT) was detected in Stenotrophomonas sp. by in silico
methods (Nunvar et al. 2010). RAYTs are related to the
IS200/IS605 family of transposases in terms of conserved
amino acid motifs; however, they differ in that RAYTs lack
the flanking stem loop sequences found in IS200/IS605 (Ronning
et al. 2005). Instead, several RAYTs are flanked by inverted
REP sequences, that is, a 5' to 3' configuration of the REP sequence
on the side of the transposase gene encoding the
amino terminal end and a 3' to 5' REP configuration on
the side encoding the carboxyl terminal end. The close
association and conserved configuration between REPs and
RAYTs raise the question of how Stenotrophomonas
sp. REPs are mobilized. The authors hypothesize that
RAYTs may be responsible for the proliferation of REP units,
and thus, REPs may be transposable. Previously, Siguier et al.
(2006) also suggested that REPs may be nonautonomous
transposable elements.

Additional analyses in Stenotrophomonas maltophilia 
show that some REP sequences uniformly have a GTAG tet- 
ranucleotide sequence preceding the palindromic REP on its 
5' side (Rocco et al. 2010). These elements are termed Ste. 
maltophilia GTAG (SMAG). The SMAG REP sequences ap- 
pear to alter the stability of upstream gene transcripts in 
Ste. maltophilia. The presence of one or two units of SMAG 
downstream of a gene has a stabilizing effect on the gene 
transcript yet a trimer SMAG appears to have a destabilizing 
effect. Thus, the SMAG sequences regulate gene expression 
at the post-transcriptional level in Ste. maltophilia but in 
a complex manner. 
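
The defining SMAG signature, a GTAG tetranucleotide immediately 5' of a palindromic REP, can be sketched as a simple sequence scan. The Python example below is a minimal illustration under stated assumptions: the stem length, the allowed loop sizes, the function names, and the test sequence are all hypothetical choices made for the example, not parameters reported by Rocco et al. (2010), and only perfect stems are recognized.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def find_gtag_hairpins(seq: str, stem: int = 6,
                       min_loop: int = 3, max_loop: int = 8):
    """Find GTAG motifs that are directly followed by a perfect hairpin.

    Returns a list of (gtag_position, loop_length) tuples. The stem and
    loop bounds are illustrative assumptions, not published values.
    """
    seq = seq.upper()
    hits = []
    start = seq.find("GTAG")
    while start != -1:
        arm_start = start + 4                      # first base after GTAG
        left_arm = seq[arm_start:arm_start + stem]
        if len(left_arm) == stem:
            right_arm = reverse_complement(left_arm)
            for loop in range(min_loop, max_loop + 1):
                r = arm_start + stem + loop        # start of the 3' arm
                if seq[r:r + stem] == right_arm:
                    hits.append((start, loop))
                    break
        start = seq.find("GTAG", start + 1)
    return hits


# Invented test sequence: GTAG, a 6-nt stem, a 4-nt loop, the matching arm.
test_seq = "AAA" + "GTAG" + "GCCGGA" + "TTTT" + "TCCGGC" + "AAA"
print(find_gtag_hairpins(test_seq))   # [(3, 4)]
```

Counting how many such hits lie in tandem downstream of a gene would be one way to distinguish single, dimeric, and trimeric arrangements, although the spacing rules for that are not given in the text.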

REP sequences have been implicated in the interaction 
with the host immune system. Synthetic oligodeoxynucleoti- 
des that mimic gram-negative bacterial REP unit sequences 
and their secondary structures were shown to stimulate 
the mammalian immune system via Toll-like receptor 9. This 
appears to be based on the CpG motif of REP sequences 
(Magnusson et al. 2007). It was hypothesized that REPs 
may also be involved in induction of human septic shock 
by pathogenic bacteria carrying REP sequences. 

REP2 Repeats — Involvement in
a Virulence Process

Intergenic repeat sequences that share no homologies with 
other known repeat sequences are found in Neisseria sp. 
(Parkhill et al. 2000). One is termed REP2, which ranges 
in size from 120 to 150 bp. REP2 is found repeated 26 times
in intergenic regions of N. meningitidis MC58 and 23 times
in N. gonorrhoeae FA1090. These repeats do not have TIRs and
have no relationship with REP sequences. However, they 
have two internal inverted repeats that form predicted in- 
ternal stem loops at the RNA level (fig. ^b). REP2 sequences 
appear to represent a unique class of repeats in that they 
contain a promoter sequence, a ribosome binding site, 
and an ATG initiation codon. They are often present upstream
of open reading frames. In N. meningitidis Z2491,
REP2 repeats are found immediately upstream of 14 genes 
that are coordinately upregulated during initial cell-to-cell 
contact with human cells (Morelle et al. 2003). Two of these
genes are pilC1 and crgA. PilC1 is an adhesin that
mediates attachment of N. meningitidis to host cells. CrgA is 
a transcriptional regulator termed contact-regulated gene A 
protein (Deghmane et al. 2000). Both pilC1 and crgA are 
induced with initial host cell contact, and both have REP2 
sequences in their upstream regions. Thus, REP2 sequences 
participate in control of expression of genes essential for the 
interaction of N. meningitidis with human host cells (Morelle 
et al. 2003). 

REP2 repeat sequences not only represent fusions of their 
translational start site with open reading frames but they 
also appear to contain mRNA 5' UTR sequences. This fasci- 
nating repeat poses interesting questions concerning its or- 
igin and mechanism of proliferation in the neisserial 
genome. Did it originally arise from an upstream regulatory 
site and the 5' UTR sequence of a protein gene? 

Borrelia Sequences — Similarities to 
Neisserial REP2 

Repeat elements in Borrelia chromosomes have not been re- 
ported. These chromosomes are small, for example, the Bor- 
relia burgdorferi chromosome is approximately 0.9 Mb. There 
is tight packing of housekeeping and other genes and a pau- 
city of intergenic space. Thus, there may be selective pressure 
to limit establishment of repeat elements in the Borrelia chro- 
mosome. However, repeats are present in Borrelia plasmid in- 
tergenic regions, albeit in a small copy number (Casjens et al. 
2000). Sequence elements termed IR-A and IR-B that contain 
internal inverted repeats were found in both circular and lin- 
ear plasmids (Dunn et al. 1994; Zuckert and Meyer 1996). 
These sequences have motifs strikingly similar to the REP2 re- 
peat found in N. meningitidis (Parkhill et al. 2000; Morelle 
et al. 2003), that is, both the Borrelia IR sequences 
and the Neisseria REP2 sequence are located immediately
upstream of genes and contain a promoter sequence, ribosome
binding site, and an ATG start codon. Both sequences
also have two internal inverted repeats close to their 5' ends 
that form predicted internal stem loops 1 and 2 (fig. ^b). 
Dunn et al. (1994) originally showed the Borrelia stem loops 
at the DNA level. 

Other intergenic sequences in Borrelia plasmids have in- 
verted repeats identical in stem loop structure to the inverted 
repeats that flank termini of an IS related to IS200/IS605 
(Delihas 2009). The Borrelia IS 5' and 3' end flanking inverted 
repeats form stem loops; however, each has its own second- 
ary structure signature. Significantly, these stem loop sequen- 
ces are found associated with the 3' ends of two types of 
putative lipoprotein genes and independent of transposase 
gene sequences. In one case (involving the IS 5' end specific 
stem loop motif), the secondary structure is phylogenetically 
conserved at the RNA level with base-pair compensatory 
changes. In the other case, the IS stem loop motif associated 
with lipoprotein-1 genes is not conserved and the secondary 
structure appears to have undergone rapid evolutionary 
change between Borrelia burgdorferi strains. Borrelia plas- 
mids contain many fragmented transposase gene sequences 
(Fraser et al. 1997). The IS200/IS605 inverted repeat flanking
sequences may be selectively conserved during decay of the
IS element and, given their evolutionary conservation
or evolutionary development, may form functional
units when located near 3' ends of genes.

CRISPRs — Short Palindromic Repeats 
Are Focal Points in a Specialized 
Regulatory System 

CRISPRs differ from most other repeats described here in that 
these small sequences are part of a complex genetic arrange- 
ment. This consists of an array of palindromic DRs of approx- 
imately 28-49 bp. Linked with each repeat are variable spacer 
sequences that are fragments of foreign DNA (phage or plas- 
mid DNA), or in some cases, host DNA. An array of protein 
genes termed CRISPR-associated (cas) genes are also closely 
associated with the palindromic repeat/spacer units. CRISPRs 
function as regulatory complexes. Recently, there has been 
great interest in the genetic and molecular characteristics 
of CRISPRs, for several reasons. First, the CRISPR system
can function as a bacterial and archaeal immune system,
whereby CRISPR defends the organism from invading viral
or plasmid DNA (Al-Attar et al. 2011). In addition, the mechanism
of action of CRISPR systems has similarities to the eukaryotic
piwi-interacting RNA (piRNA) mechanism, an RNA-based
immune system that inhibits mobile elements in germ line
cells (Karginov and Hannon 2010; Marraffini and Sontheimer
2010a). Lastly, this genetic element offers an example of
a type of Lamarckian inheritance in prokaryotes (Koonin 
and Wolf 2009). The CRISPR DNA complex was first found 
in E. coli (Ishino et al. 1987), although much of its 
characterization and functions have only been elucidated recently,
approximately during the past 10 years (Barrangou
et al. 2007). 

Here, we provide a short description of the molecular/genetic
aspects of CRISPR functions as they relate to immunity
to invading or self-DNA. It is beyond the scope of this paper to
describe the CRISPR complex in detail. There are numerous
published reviews. We point out two recent reviews (Al-Attar
et al. 2011; Terns MP and Terns RM 2011) and a perspectives
paper (Makarova et al. 2011). These papers describe the history,
evolution, and known mechanisms of action of the
CRISPR-based defense system against virus or plasmid invasions
of bacterial and archaeal cells.

There are basically three stages in the molecular and ge- 
netic processes of CRISPR function. During the acquisition 
stage, CRISPRs can capture fragments of foreign DNA from 
virus or plasmid sequences when challenged with the foreign 
DNA. A short segment (approximately 25-70 bp) of the foreign
DNA, called a proto-spacer, is inserted into the CRISPR
locus of the host DNA between two palindromic repeat sequences.
How the cell recognizes the short foreign DNA is
unclear, but a small sequence (a few nucleotides) adjacent
to the proto-spacer in the foreign DNA may
be a recognition site (Mojica et al. 2009; Makarova et al.
2011). This small sequence is termed a proto-spacer-adjacent
motif sequence (Mojica et al. 2009). Two Cas proteins may be
involved in the acquisition process. Additional spacers are 
then added to form an array of spacer-palindromic sequence 
repeats. It is not known if the palindromic repeat sequences 
serve as Cas protein recognition sites for integration of DNA 
fragments into the CRISPR complex (Nam et al. 2011).
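
The repeat-spacer organization that results from acquisition can be made concrete with a small Python sketch that recovers the spacers from an array when the repeat sequence is known. The repeat and spacer sequences below are invented and far shorter than real ones, the function name is hypothetical, and the sketch assumes perfectly conserved repeats, which real arrays do not always have.

```python
def extract_spacers(array: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive copies of a repeat.

    Assumes every repeat copy is an exact match; degenerate repeats
    would need approximate matching, which is not attempted here.
    """
    positions = []
    i = array.find(repeat)
    while i != -1:
        positions.append(i)
        i = array.find(repeat, i + len(repeat))   # non-overlapping search
    return [array[a + len(repeat):b]
            for a, b in zip(positions, positions[1:])]


# Toy array: repeat, spacer 1, repeat, spacer 2, repeat (all invented).
repeat = "GTTCCAGAAC"
array = repeat + "AAATTTGGG" + repeat + "CCCGGGTTT" + repeat
print(extract_spacers(array, repeat))   # ['AAATTTGGG', 'CCCGGGTTT']
```

In this representation, acquiring a new spacer corresponds to inserting one more spacer-repeat unit into the array.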

In the second stage, the CRISPR complex is transcribed and 
cas genes are transcribed and translated. In E. coli, the large 
precursor CRISPR transcript is processed by a ribonucleoprotein
complex termed Cascade (CRISPR-associated complex
for antiviral defense). A Cas-specific endonuclease processes
the RNA via cleavage at the base of the repeat stem loop sequence,
and with additional trimming, the mature RNA is
formed (Brouns et al. 2008; Gesner et al. 2011; Jore et al.
2011; Sashital et al. 2011). After processing, the Cascade
complex retains RNA transcripts of foreign spacer DNA 
and stem loop repeat sequences and bound Cas proteins. 

In the third stage, Cascade binds one strand of the target
DNA via complementary base-pairing between spacer
RNA and target DNA to form an RNA/DNA heteroduplex.
The target DNA strand is subsequently cleaved
(Jore et al. 2011). Cas3 protein, which has endonuclease
properties, may be the major protein associated with target
DNA inactivation in E. coli (Brouns et al. 2008).
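
The targeting step depends on base-pair complementarity between the spacer-derived RNA and one strand of the invading DNA. As a simplified sketch, the Python code below searches both strands of a target sequence for exact matches to a spacer. Exact matching, the function names, and the toy sequences are assumptions made for illustration; the sketch does not model Cascade binding, mismatch tolerance, or the proto-spacer-adjacent motif.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def find_targets(target: str, spacer: str):
    """Locate exact spacer matches on either strand of the target DNA.

    Returns sorted (position, strand) tuples, where position is the
    0-based index on the given (top) strand and strand is '+' or '-'.
    """
    target, spacer = target.upper(), spacer.upper()
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        i = target.find(query)
        while i != -1:
            hits.append((i, strand))
            i = target.find(query, i + 1)
    return sorted(hits)


# Invented spacer and target; the target carries the spacer on both strands.
spacer = "AAATTTGGGC"
target = "CCCC" + spacer + "TTTT" + reverse_complement(spacer) + "AA"
print(find_targets(target, spacer))   # [(4, '+'), (18, '-')]
```

A spacer with only partial identity to its target, such as the approximately 84% case discussed below, would not be found by this exact search and would require an alignment that tolerates mismatches.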

This molecular process that results in defense against 
invading DNA was shown to be present in organisms that 
include Streptococcus, Staphylococcus, and E. coli species. 
However, CRISPRs can display different roles in different
microorganisms, and spacer DNA may consist of a fragment
of a host protein gene. 

In a clinical strain of Pse. aeruginosa lysogenized with the 
temperate phage DMS3, a CRISPR unit was found to be re- 
quired for inhibition of biofilm formation and swarming mo- 
tility (Zegans et al. 2009). One of the spacers of this unit, 
termed spacer 1, was found to be the determinant in inhibition
(Cady and O'Toole 2011). However, spacer 1 has partial
identity (approximately 84%) to phage gene dms-42.
Thus, the correlation between this spacer sequence and in- 
hibition of biofilm formation is puzzling, but spacer 1 was 
found to interact with the phage DMS-42 gene. Another 
spacer in this CRISPR unit, spacer 2, was shown to carry 
a segment of temperate phage DMS3 DNA with 100% 
identity, but this does not appear to result in defense against 
the phage. Of interest is that the lysogenized Pse. aerugino- 
sa strain that is unable to form a biofilm is a clinical isolate, as 
biofilm formation by Pse. aeruginosa is thought to be an im- 
portant factor in establishment of chronic lung infections by 
Pse. aeruginosa (Palmer and Whiteley 2011).

Aklujkar and Lovley (2010) show that the capture of 
a fragment (proto-spacer) of the host gene hisS by a CRISPR 
complex results in inhibition of expression of host hisS, the 
histidyl-tRNA synthetase gene. Furthermore, they propose 
that during evolution, inhibition of expression of hisS by 
the CRISPR complex resulted in loss of ancestral genes that 
encode proteins containing a high percentage of histidines 
or have closely spaced histidines in their peptide chains. An- 
cestral genes that rely on histidyl-tRNA synthetase activity 
and were lost include those that express the subunit of 
an NADH dehydrogenase I complex and multiheme c-type 
cytochromes. Approximately 16 genes were lost during evolution
of P. carbinolicus. It is believed that this organism survived
because it retained another NADH dehydrogenase I
complex, whereby a component protein does not have 
a cluster of histidines, and perhaps by relying on fermenta- 
tion genes as well. 

This is a rather far-reaching finding. The inhibition of 
a "self" gene activity by the CRISPR complex can be consid- 
ered an autoimmune process in bacteria. This concept has 
been mentioned before (Marraffini and Sontheimer 2010b;
Stern et al. 2010), but now has been shown experimentally 
by Aklujkar and Lovley (2010). 

Repeats as Possible Engines for 
Genome Change 

A comparison of genomic sequences from related species 
shows that repeats may be associated with high levels of in- 
tragenic recombination (Silby et al. 2009; Ogier et al. 2010; 
Kristoffersen et al. 2011). There is a striking lack of synteny
between three closely related strains of Pse. fluorescens (Silby 
et al. 2009), and these strains vary greatly in repeat sequence 
abundance. For example, repeat elements R0 and R2 are 
highly represented in one strain (SBW25) but are absent or
found in low abundance in others (strains Pf0-1 or Pf-5).
In Xenorhabdus, ERIC-like sequences are in a chromosomal 
region of plasticity (termed Locus D). This region contains 
two ERIC (MITE) sequences, two transposase genes, and 
three truncated or disrupted genes. It was hypothesized that 
the ERIC sequence and the transposase genes play a role in 
plasticity of this chromosomal region (Ogier et al. 2010). In 
the Bacillus cereus group, repeat sequences bcr4-bcr6 and 
their unique locations with respect to neighbor genes may 
be associated with genomic rearrangements (Kristoffersen 
et al. 2011). 

In E. coli, the intergenic region between metabolic genes 
folA and apaH is highly variable (Mine et al. 2009). This is of
special interest in that the toxin-antitoxin system encoded by
the ccdO157 gene complex is found between folA and apaH
in E. coli O157:H7 EDL933. Some E. coli strains carry defective
ccdO157 genes or lack these genes completely in this region.
Although the reason for this extensive instability is not
known, an analysis of several hundred E. coli strains shows
that about 50% of the isolates contain a REP sequence in this 
region. REPs may in part account for the extreme plasticity of 
this region. Thus, evidence is accumulating to suggest that 
MITEs, REPs, and other repeats may play a role in genome 
dynamics during evolution in diverse species. 

MITEs appear to play a role in evolution of individual genes. 
MITEs have been found inserted into gene loci of microcystin 
genes (mcy) in the cyanobacteria Anabaena isolated from the 
Baltic Sea, with the subsequent inactivation of these genes 
(Fewer et al. 2011). Microcystins are toxins that inhibit eu- 
karyotic phosphatases (MacKintosh et al. 1990). MITE inser- 
tion into the mcy gene may provide a biological diversity in 
the population of the cyanobacteria. The mcy genes are con- 
sidered ancient genes. The ability to synthesize microcystins 
has been repeatedly lost during evolution (Rantala et al. 
2004). The Anabaena MITE may have been involved in this 
evolutionary process (Fewer et al. 2011).

IS elements recognize REP sequences as target sites for 
insertion. IS1397 transposes specifically into REPs in E. coli,
S. enterica serovar Typhimurium, and Klebsiella sp. (Wilde
et al. 2001). IS621 found in E. coli recognizes a 15-bp sequence
in REP units and inserts into the REP sequences at
its 3' side but outside of the inverted repeat sequences. This
type of insertion is found in 10 chromosomal loci (Choi et al.
2003). In addition, IS1594, which is present in Anabaena,
also inserts into REP-like sequences found in the Anabaena
chromosome. Both IS621 and IS1594 belong to the IS110/
IS492 family (Choi et al. 2003). Bioinformatics analyses
show that REP sequences are targets for insertion of IS el- 
ements in Pseudomonas, Neisseria, and Sinorhizobium spe- 
cies (Tobes and Pareja 2006). Thus, the phenomenon of REPs 
serving as IS target sites for insertion is widespread and 
shows that REPs can affect plasticity. 



Repeat Element Open Reading 
Frames, Insertion into Protein Genes 

In addition to their prominent location in intergenic regions, 
many repeat sequences display open reading frames that 
are found fused in-frame with genomic open reading 
frames (Ogata et al. 2000; Delihas 2007; Croucher et al. 
2011; Hot et al. 2011; Fewer et al. 2011). Some repeats 
are found fused internally into protein coding sequences 
(Ogata et al. 2000; Croucher et al. 2011; Hot et al. 
2011). Others extend the 3'-terminal ends of protein genes
(Delihas 2007; Croucher et al. 2011; Hot et al. 2011) or the
5' ends (Croucher et al. 2011; Hot et al. 2011). An RUP in- 
sertion disrupts the coding sequence of the gene encoding 
a putative iron ABC transporter binding protein (Croucher 
et al. 2011). A repeat termed Betaproteobacterial repeat 
element (BRE) is present in Bordetella and other betaproteobacteria
(Hot et al. 2011). Rather striking is the large
number of protein genes (approximately 9 genes) that 
contain BRE inserts internally. 

The possibility that repeat element fusions may create 
new proteins has been mentioned (Delihas 2008; Croucher 
et al. 2011). Of major interest is that a BOX element that
potentially encodes a 42-amino acid predicted protein
was found to be transcribed (Croucher et al. 2011). The detection
of a translated protein product would show for the
first time that a novel protein is formed by a repeat element. 

Some repeats form fusions with sequences specifying 
protein domains such as the left-handed parallel beta helix, 
and others display motifs such as predicted transmembrane 
helices (Delihas 2007). Many of these fusions are annotated 
as hypothetical protein genes. It is not known if they are evolutionarily
stable or transient, but some may serve as evolutionary
reservoirs for new gene development (Treangen
et al. 2009). 

The annotation of genes whereby repeat sequences are 
shown to be part of an open reading frame can help define 
genetic loci better and/or raise questions concerning the lo- 
cus. Several gene loci that contain repeat sequences have 
been annotated (Parkhill et al. 2000, 2001; Wei et al. 
2003). But when these repeats are missed, this may raise 
questions concerning the locus. For example, locus 
NMB0202 (Accession number NC_003112, coordinates 
204159-204332) is annotated as a hypothetical 57-amino 
acid protein in N. meningitidis MC58. This sequence and 
three identical annotated sequences in related N. meningi- 
tidis strains contain a hypothetical translated 47-amino acid 
REP2 sequence; thus, the REP2 sequence represents approx- 
imately 82% of the open reading frame. This poses the 
question of whether this hypothetical gene locus is essen- 
tially an intergenic region that has a fusion of the REP2 open 
reading frame with a small adjacent open reading frame. 
REP2 sequences, in addition to having signatures at the 
DNA level, also display translated open reading frames. 
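
The figures quoted for NMB0202 can be checked with simple arithmetic. The Python sketch below assumes, as is conventional for such annotations but not stated in the text, that the coordinates are inclusive and that the annotated span includes the stop codon. The function names are hypothetical.

```python
def orf_length_aa(start: int, end: int, has_stop: bool = True) -> int:
    """Protein length implied by inclusive nucleotide coordinates.

    Assumes the span is a whole number of codons and, by default,
    that the last codon is a stop codon that is not translated.
    """
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError("span is not a whole number of codons")
    codons = nucleotides // 3
    return codons - 1 if has_stop else codons


def percent_of_orf(part_aa: int, whole_aa: int) -> float:
    """Percentage of an open reading frame covered by a sub-sequence."""
    return 100.0 * part_aa / whole_aa


protein_len = orf_length_aa(204159, 204332)    # 174 nt = 58 codons
print(protein_len)                             # 57
print(round(percent_of_orf(47, protein_len)))  # 82
```

Under these assumptions, the coordinates give the annotated 57-amino acid length, and the 47-amino acid REP2 portion corresponds to about 82% of the open reading frame, in agreement with the values in the text.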

Conclusions and Future Prospects

Small intergenic repeat sequences play an intricate role in 
molecular and functional aspects of the bacterial cell. Their 
individual signatures display a range of structure/function 
motifs, for example, MITE-like sequences straddle integrons 
in Ent. cloacae and Acinetobacter (Gillings et al. 2009; Poirel
et al. 2009), the REP2 in Neisseria and the Borrelia IR sequen- 
ces contain promoter sequences, a ribosome binding site 
and an ATG initiation codon followed by open reading 
frames (Dunn et al. 1994; Morelle et al. 2003), the Correia 
element in Neisseria and the REP cluster units (BIMEs) in E. 
coli carry an IHF binding site (Oppenheim et al. 1993; Buisine
et al. 2002), and the Correia units also carry functional promoters
(Siddique et al. 2011). Different repeats show a versatility
in function such as regulation of expression of genes
essential for interaction with human host cells (Morelle et al. 
2003), serving as a recognition and cleavage site during RNA 
processing of the CRISPR transcript (Gesner et al. 2011), and
serving as target sites for insertion of IS elements (Tobes and 
Pareja 2006). Neisserial intergenic mosaic elements (NIME) 
sequences may be involved in silent pilin gene recombination
in N. meningitidis (Parkhill et al. 2000); these repeats are
intimately associated with the pilE/S locus in a complex array
of pilin genes and NIME sequences. The MITE-integron
association raises the question of a role in the transfer of drug
resistance genes.

In terms of bacterial evolution, repeat sequences are 
found at sites of plasticity in the bacterial genome (Mine 
et al. 2009; Silby et al. 2009; Ogier et al. 2010; Kristoffersen
et al. 2011) and again, they can affect plasticity by serving as
sites for IS integration in the genome (Tobes and Pareja 
2006). In addition, mobile repeats may have influenced a cy- 
cle of active versus inactive genes during evolution (Fewer 
et al. 2011). As repeat elements can be detrimental when
incorporated into essential genes, evolutionarily there 
may have been a selection against Streptococcus sp. carry- 
ing a large number of mobile repeats, as current populations 
of Streptococcus appear to have fewer elements than their 
ancestors (Croucher et al. 2011).

Did repeat sequences and associated molecular/functional
signatures evolve independently in different microorganisms
or were they acquired by horizontal transfer? For
some repeats evidence is consistent with an independent 
origin. The REP2 unit in Neisseria and the Borrelia 180 IR se- 
quences have negligible nucleotide sequence homology yet 
they both have similar structure/functional signatures and 
can be found immediately upstream of genes. MITEs in Neis- 
seria, E. coli, and Anabaena have similar overall MITE fea- 
tures, but core sequences show no similarities in nt 
sequence or structure/function motifs. MITE-like sequences 
straddle integrons in both Ent. cloacae and Acinetobacter 
sp. Although their integrases are homologous, the MITE sequences
show no similarities, and the internal structures of
the integrons differ. This argues for an independent formation
of MITE-integrons in these species, as previously proposed
(Gillings et al. 2009).

How did these elements originate and how are they 
transferred? MITEs may have arisen by a selective conserva- 
tion of IS-specific IR sequences during decay of a transpos- 
able element. Lin et al. (2011) proposed that a group of 
MITEs in M. aeruginosa originated by deletion of the IS core 
that encodes the transposase gene. In Borrelia, IS-specific IR
sequences may have been duplicated or were selectively
conserved during decay of the IS sequence and transferred 
to 3' end regions of putative lipoprotein genes (Delihas 
2009). The very unusual REP2 repeat sequences may have 
originated from an upstream regulatory region of a gene 
that included the 5' untranslated region and was subse- 
quently duplicated and transferred to other chromosomal 
locations. 

On mobility, MITEs can be transferred by a related trans- 
posase as exemplified by the in vivo transfer of the MITE-like 
sequence IMU by transposase (Poirel et al. 2009). By bioin- 
formatics analysis, the Nezha MITE was shown to be recently 
transferred between species (Zhou et al. 2008). Inverted re- 
peats of two MITE-like sequences in Pse. fluorescens are 
identical to the inverted repeats straddling the ends of IS el- 
ements present in the same organism (Silby et al. 2009), 
which hints at a transfer by the transposase. Thus evidence 
has accumulated to show or strongly suggest that many 
MITE sequences are mobilized by IS transposases. With re- 
spect to REP sequences, it has been hypothesized that the 
RAYT may be responsible for the proliferation of REP units in
the Stenotrophomonas chromosome (Nunvar et al. 2010). 

Several repeat sequences have not been analyzed in terms of
possible function, for example, ATR, REP 3-5 (Parkhill et al.
2000), and elements R0, R R2. R6, R178, and IR1_g (Silby 
et al. 2009). These may show additional intriguing properties. 